This report fulfils a request from politicians and managers at the Food Standards Agency for a specific analysis of food hygiene enforcement data.
The data contain one row per local authority in England, Wales and Northern Ireland, summarising the food establishments within that authority. Establishments within each local authority are rated for their potential impact on public health.
| Variable Name | Description |
|---|---|
| Country | The country in which the local authority is located. |
| LAType | The local authority's category. |
| LAName | The local authority's name. |
| Totalestablishments(includingnotyetrated&outside) | Total number of establishments, including those outside the programme and those not yet rated for intervention. |
| Total%ofInterventionsachieved(premisesratedA-E) | Percentage of interventions achieved across premises rated A to E. |
| Total%ofInterventionsachieved-premisesratedA | Percentage of interventions achieved for A-rated premises. |
| Total%ofInterventionsachieved-premisesratedB | Percentage of interventions achieved for B-rated premises. |
| Total%ofInterventionsachieved-premisesratedC | Percentage of interventions achieved for C-rated premises. |
| Total%ofInterventionsachieved-premisesratedD | Percentage of interventions achieved for D-rated premises. |
| Total%ofInterventionsachieved-premisesratedE | Percentage of interventions achieved for E-rated premises. |
| Aratedestablishments | Number of A-rated establishments. |
| Bratedestablishments | Number of B-rated establishments. |
| Cratedestablishments | Number of C-rated establishments. |
| Dratedestablishments | Number of D-rated establishments. |
| Eratedestablishments | Number of E-rated establishments. |
| ProfessionalFullTimeEquivalentPosts-occupied * | Number of professional full-time equivalent (FTE) posts currently occupied. |
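# Load the packages used throughout the report
library(tidyverse) # read_csv(), dplyr verbs, ggplot2
library(Hmisc)     # rcorr()
library(plotly)    # ggplotly()
library(ggpubr)    # ggarrange()
library(gridExtra) # grid.arrange()
# Import the data set and check its structure and summary
df <- read_csv("2019-20-enforcement-data-food-hygiene.csv")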
## Rows: 353 Columns: 36
## ── Column specification ────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country, LAType, LAName, Total%ofBroadlyCompliantestablishments-A, Total%ofInterventio...
## dbl (30): Totalestablishments(includingnotyetrated&outside), Establishmentsnotyetratedforinterve...
## num (1): TotalnumberofestablishmentssubjecttoWrittenwarnings
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
print(head(df))
## # A tibble: 6 × 36
## Country LAType LAName Total…¹ Estab…² Estab…³ Total…⁴ Total…⁵ Arate…⁶ Total…⁷ Brate…⁸ Total…⁹
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 England District C… Adur … 1478 24 0 97.2 95.6 3 33.33 39 69.2
## 2 England District C… Aller… 1316 29 74 97.2 94.9 2 50 26 76.9
## 3 England District C… Amber… 1112 1 0 97.5 97.4 2 50 39 64.1
## 4 England District C… Arun 1208 44 1 97.7 94.1 3 0 28 82.1
## 5 England District C… Ashfi… 905 26 1 96.7 93.9 1 0 31 77.4
## 6 England District C… Ashfo… 1132 0 0 98.6 98.6 5 20 15 66.7
## # … with 24 more variables: Cratedestablishments <dbl>,
## # `Total%ofBroadlyCompliantestablishments-C` <dbl>, Dratedestablishments <dbl>,
## # `Total%ofBroadlyCompliantestablishments-D` <dbl>, Eratedestablishments <dbl>,
## # `Total%ofBroadlyCompliantestablishments-E` <dbl>,
## # `Total%ofInterventionsachieved(premisesratedA-E)` <dbl>,
## # `Total%ofInterventionsachieved-premisesratedA` <chr>,
## # `Total%ofInterventionsachieved-premisesratedB` <dbl>, …
print(str(df))
## spc_tbl_ [353 × 36] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Country : chr [1:353] "England" "England" "England" "England" ...
## $ LAType : chr [1:353] "District Council" "District Council" "District Council" "District Council" ...
## $ LAName : chr [1:353] "Adur and Worthing" "Allerdale" "Amber Valley" "Arun" ...
## $ Totalestablishments(includingnotyetrated&outside) : num [1:353] 1478 1316 1112 1208 905 ...
## $ Establishmentsnotyetratedforintervention : num [1:353] 24 29 1 44 26 0 58 40 41 84 ...
## $ Establishmentsoutsidetheprogramme : num [1:353] 0 74 0 1 1 0 214 39 0 42 ...
## $ Total%ofBroadlyCompliantestablishmentsratedA-E : num [1:353] 97.2 97.2 97.5 97.7 96.7 ...
## $ Total%ofBroadlyCompliantestablishments(includingnotyetrated) : num [1:353] 95.6 94.9 97.4 94.1 93.9 ...
## $ Aratedestablishments : num [1:353] 3 2 2 3 1 5 1 4 1 4 ...
## $ Total%ofBroadlyCompliantestablishments-A : chr [1:353] "33.33" "50" "50" "0" ...
## $ Bratedestablishments : num [1:353] 39 26 39 28 31 15 20 44 31 36 ...
## $ Total%ofBroadlyCompliantestablishments-B : num [1:353] 69.2 76.9 64.1 82.1 77.4 ...
## $ Cratedestablishments : num [1:353] 227 243 179 211 145 125 270 219 96 190 ...
## $ Total%ofBroadlyCompliantestablishments-C : num [1:353] 91.2 90.1 93.8 94.3 89.7 ...
## $ Dratedestablishments : num [1:353] 592 469 432 483 353 453 555 626 186 519 ...
## $ Total%ofBroadlyCompliantestablishments-D : num [1:353] 99 99.4 99.5 98.5 98.3 ...
## $ Eratedestablishments : num [1:353] 593 473 459 438 348 534 628 1030 219 525 ...
## $ Total%ofBroadlyCompliantestablishments-E : num [1:353] 99.8 100 100 100 100 ...
## $ Total%ofInterventionsachieved(premisesratedA-E) : num [1:353] 96.1 90.6 88.9 94 80.7 ...
## $ Total%ofInterventionsachieved-premisesratedA : chr [1:353] "100" "100" "100" "100" ...
## $ Total%ofInterventionsachieved-premisesratedB : num [1:353] 100 98.3 95.1 96.3 100 ...
## $ Total%ofInterventionsachieved-premisesratedC : num [1:353] 95.5 89.7 97 94.4 78.8 ...
## $ Total%ofInterventionsachieved-premisesratedD : num [1:353] 96 93 91.8 92.6 85.3 ...
## $ Total%ofInterventionsachieved-premisesratedE : num [1:353] 94 85.1 72.3 95.5 68.3 ...
## $ Total%ofInterventionsachieved-premisesnotyetrated : num [1:353] 100 100 100 95.4 79.6 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure : num [1:353] 5 0 0 2 1 0 0 0 0 0 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood : num [1:353] 4 0 0 0 0 0 0 0 0 0 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence: num [1:353] 0 0 0 0 0 0 1 0 0 0 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice : num [1:353] 0 0 0 0 0 0 0 0 0 0 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder : num [1:353] 0 0 0 0 0 0 1 0 0 0 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution : num [1:353] 0 0 1 0 0 0 0 0 0 0 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices : num [1:353] 3 6 11 3 4 0 3 2 1 2 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices : num [1:353] 0 0 0 0 0 0 0 0 0 0 ...
## $ TotalnumberofestablishmentssubjecttoWrittenwarnings : num [1:353] 323 413 515 386 252 224 223 152 179 175 ...
## $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded : num [1:353] 0 0 1 0 0 0 0 2 0 0 ...
## $ ProfessionalFullTimeEquivalentPosts-occupied * : num [1:353] 5 4 3.5 4 2 4.65 2.5 5 2 4.2 ...
## - attr(*, "spec")=
## .. cols(
## .. Country = col_character(),
## .. LAType = col_character(),
## .. LAName = col_character(),
## .. `Totalestablishments(includingnotyetrated&outside)` = col_double(),
## .. Establishmentsnotyetratedforintervention = col_double(),
## .. Establishmentsoutsidetheprogramme = col_double(),
## .. `Total%ofBroadlyCompliantestablishmentsratedA-E` = col_double(),
## .. `Total%ofBroadlyCompliantestablishments(includingnotyetrated)` = col_double(),
## .. Aratedestablishments = col_double(),
## .. `Total%ofBroadlyCompliantestablishments-A` = col_character(),
## .. Bratedestablishments = col_double(),
## .. `Total%ofBroadlyCompliantestablishments-B` = col_double(),
## .. Cratedestablishments = col_double(),
## .. `Total%ofBroadlyCompliantestablishments-C` = col_double(),
## .. Dratedestablishments = col_double(),
## .. `Total%ofBroadlyCompliantestablishments-D` = col_double(),
## .. Eratedestablishments = col_double(),
## .. `Total%ofBroadlyCompliantestablishments-E` = col_double(),
## .. `Total%ofInterventionsachieved(premisesratedA-E)` = col_double(),
## .. `Total%ofInterventionsachieved-premisesratedA` = col_character(),
## .. `Total%ofInterventionsachieved-premisesratedB` = col_double(),
## .. `Total%ofInterventionsachieved-premisesratedC` = col_double(),
## .. `Total%ofInterventionsachieved-premisesratedD` = col_double(),
## .. `Total%ofInterventionsachieved-premisesratedE` = col_double(),
## .. `Total%ofInterventionsachieved-premisesnotyetrated` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices` = col_double(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices` = col_double(),
## .. TotalnumberofestablishmentssubjecttoWrittenwarnings = col_number(),
## .. `Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded` = col_double(),
## .. `ProfessionalFullTimeEquivalentPosts-occupied *` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
## NULL
print(summary(df))
## Country LAType LAName
## Length:353 Length:353 Length:353
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Totalestablishments(includingnotyetrated&outside) Establishmentsnotyetratedforintervention
## Min. : 145.0 Min. : 0.00
## 1st Qu.: 920.5 1st Qu.: 25.00
## Median :1330.0 Median : 49.00
## Mean :1620.7 Mean : 89.75
## 3rd Qu.:2004.5 3rd Qu.: 100.00
## Max. :9277.0 Max. :1744.00
## NA's :6 NA's :6
## Establishmentsoutsidetheprogramme Total%ofBroadlyCompliantestablishmentsratedA-E
## Min. : 0.00 Min. : 74.61
## 1st Qu.: 0.00 1st Qu.: 95.37
## Median : 2.00 Median : 97.13
## Mean : 49.62 Mean : 96.33
## 3rd Qu.: 39.00 3rd Qu.: 98.19
## Max. :865.00 Max. :100.00
## NA's :6 NA's :6
## Total%ofBroadlyCompliantestablishments(includingnotyetrated) Aratedestablishments
## Min. :69.45 Min. : 0.000
## 1st Qu.:89.23 1st Qu.: 1.000
## Median :92.80 Median : 2.000
## Mean :91.54 Mean : 4.285
## 3rd Qu.:95.16 3rd Qu.: 5.000
## Max. :99.87 Max. :72.000
## NA's :6 NA's :6
## Total%ofBroadlyCompliantestablishments-A Bratedestablishments
## Length:353 Min. : 2.00
## Class :character 1st Qu.: 23.00
## Mode :character Median : 39.00
## Mean : 55.61
## 3rd Qu.: 68.50
## Max. :516.00
## NA's :6
## Total%ofBroadlyCompliantestablishments-B Cratedestablishments
## Min. : 5.88 Min. : 18.0
## 1st Qu.: 55.62 1st Qu.: 144.0
## Median : 69.23 Median : 225.0
## Mean : 67.34 Mean : 302.6
## 3rd Qu.: 80.15 3rd Qu.: 376.0
## Max. :100.00 Max. :1647.0
## NA's :6 NA's :6
## Total%ofBroadlyCompliantestablishments-C Dratedestablishments
## Min. : 71.96 Min. : 56.0
## 1st Qu.: 89.36 1st Qu.: 307.0
## Median : 92.86 Median : 462.0
## Mean : 92.02 Mean : 553.6
## 3rd Qu.: 95.50 3rd Qu.: 682.0
## Max. :100.00 Max. :3053.0
## NA's :6 NA's :6
## Total%ofBroadlyCompliantestablishments-D Eratedestablishments
## Min. : 75.74 Min. : 63.0
## 1st Qu.: 97.94 1st Qu.: 362.5
## Median : 99.03 Median : 480.0
## Mean : 98.38 Mean : 565.3
## 3rd Qu.: 99.60 3rd Qu.: 667.5
## Max. :100.00 Max. :4309.0
## NA's :6 NA's :6
## Total%ofBroadlyCompliantestablishments-E Total%ofInterventionsachieved(premisesratedA-E)
## Min. : 79.22 Min. : 20.64
## 1st Qu.: 99.83 1st Qu.: 81.81
## Median :100.00 Median : 90.82
## Mean : 99.82 Mean : 86.62
## 3rd Qu.:100.00 3rd Qu.: 95.39
## Max. :100.00 Max. :100.00
## NA's :6 NA's :6
## Total%ofInterventionsachieved-premisesratedA Total%ofInterventionsachieved-premisesratedB
## Length:353 Min. : 50.00
## Class :character 1st Qu.: 93.53
## Mode :character Median : 97.75
## Mean : 95.25
## 3rd Qu.:100.00
## Max. :100.00
## NA's :6
## Total%ofInterventionsachieved-premisesratedC Total%ofInterventionsachieved-premisesratedD
## Min. : 18.37 Min. : 19.77
## 1st Qu.: 89.27 1st Qu.: 82.00
## Median : 94.97 Median : 91.72
## Mean : 91.84 Mean : 86.32
## 3rd Qu.: 97.77 3rd Qu.: 95.98
## Max. :100.00 Max. :100.00
## NA's :6 NA's :6
## Total%ofInterventionsachieved-premisesratedE Total%ofInterventionsachieved-premisesnotyetrated
## Min. : 1.81 Min. : 6.56
## 1st Qu.: 65.39 1st Qu.: 85.81
## Median : 87.50 Median : 99.75
## Mean : 77.37 Mean : 90.95
## 3rd Qu.: 95.90 3rd Qu.:100.00
## Max. :100.00 Max. :100.00
## NA's :6 NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 1.000
## Mean : 2.712
## 3rd Qu.: 3.000
## Max. :62.000
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 1.199
## 3rd Qu.: 1.000
## Max. :52.000
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.06916
## 3rd Qu.:0.00000
## Max. :7.00000
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice
## Min. : 0.0000
## 1st Qu.: 0.0000
## Median : 0.0000
## Mean : 0.7118
## 3rd Qu.: 0.0000
## Max. :42.0000
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1354
## 3rd Qu.:0.0000
## Max. :6.0000
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution
## Min. : 0.0000
## 1st Qu.: 0.0000
## Median : 0.0000
## Mean : 0.4409
## 3rd Qu.: 0.0000
## Max. :14.0000
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices
## Min. : 0.000
## 1st Qu.: 1.000
## Median : 4.000
## Mean : 7.527
## 3rd Qu.: 9.000
## Max. :77.000
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.3314
## 3rd Qu.:0.0000
## Max. :9.0000
## NA's :6
## TotalnumberofestablishmentssubjecttoWrittenwarnings
## Min. : 30.0
## 1st Qu.: 195.0
## Median : 340.0
## Mean : 437.0
## 3rd Qu.: 543.5
## Max. :3061.0
## NA's :6
## Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded
## Min. : 0.0000
## 1st Qu.: 0.0000
## Median : 0.0000
## Mean : 0.6657
## 3rd Qu.: 1.0000
## Max. :25.0000
## NA's :6
## ProfessionalFullTimeEquivalentPosts-occupied *
## Min. : 0.65
## 1st Qu.: 2.50
## Median : 3.41
## Mean : 4.10
## 3rd Qu.: 5.00
## Max. :22.13
## NA's :6
Some columns contain the codes NP and NR. According to the documentation (Food Hygiene Data - Supporting Notes), NP means that no premises were supplied at this risk level and NR means that no interventions were due or reported. Both codes are replaced with numeric values, 0 for NP and 100 for NR, and the affected columns are then converted to a numeric data type.
Six rows for England contained no data in any field and were removed.
# Recode the NR and NP codes, then convert the affected columns to numeric
df$`Total%ofInterventionsachieved-premisesratedA`[df$`Total%ofInterventionsachieved-premisesratedA` == "NR"] <- "100"
df$`Total%ofInterventionsachieved-premisesratedA` <- as.numeric(df$`Total%ofInterventionsachieved-premisesratedA`)
df$`Total%ofBroadlyCompliantestablishments-A`[df$`Total%ofBroadlyCompliantestablishments-A` == "NP"] <- "0"
df$`Total%ofBroadlyCompliantestablishments-A` <- as.numeric(df$`Total%ofBroadlyCompliantestablishments-A`)
# Remove rows with missing values
df <- na.omit(df)
## # A tibble: 6 × 7
## # Groups: Country [3]
## Country LAType sumA sumB sumC sumD sumE
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 England District Council 403 5990 32456 73627 80226
## 2 England London Borough 398 3809 18087 27932 19161
## 3 England Metropolitan Borough Council 323 3744 19142 34441 31933
## 4 England Unitary Authority 241 3954 19246 43951 43555
## 5 Northern Ireland NI Unitary Authority 18 492 3529 6563 7799
## 6 Wales Welsh Unitary Authority 104 1307 12546 5574 13477
England has the highest number of establishments. The table above gives the total number of establishments in each rating category (A-E) by country and local authority type.
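The code that produced the table above is not shown in the report. As a minimal sketch of how such per-type totals could be computed, the example below sums rating counts within each Country and LAType group. The data frame `toy` and its values are invented for illustration and stand in for the cleaned `df`; only two of the five rating columns are included.

```r
# Hypothetical stand-in for the cleaned data; the values are made up.
toy <- data.frame(
  Country = c("England", "England", "Wales"),
  LAType = c("District Council", "District Council", "Welsh Unitary Authority"),
  Aratedestablishments = c(3, 2, 4),
  Bratedestablishments = c(39, 26, 15)
)

# Sum the rated-establishment counts within each Country / LAType group.
rating_totals <- aggregate(
  toy[, c("Aratedestablishments", "Bratedestablishments")],
  by = list(Country = toy$Country, LAType = toy$LAType),
  FUN = sum
)
print(rating_totals)
```

The same pattern extends to the C, D and E columns by adding them to the column selection.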
# Plot the distribution of broadly compliant establishments, coloured by country
plot1 <- ggplot(df, aes(`Total%ofBroadlyCompliantestablishmentsratedA-E`)) + geom_histogram(aes(fill = Country), colour = "black", binwidth = 0.3) + labs(x = "Total % of Broadly Compliant Establishments Rated A-E", title = "Distribution of Broadly Compliant Establishments")
ggplotly(plot1)
The graph shows the distribution of the percentage of broadly compliant establishments (rated A-E) across local authorities. In most local authorities, around 90% to 100% of establishments are rated broadly compliant. In 4 local authorities in England, 100% of establishments are broadly compliant. In 11 local authorities, the share of broadly compliant establishments is lower, at about 70-90%.
# Data of types of formal enforcement actions across different LATypes.
D3 <- df
D3$Country <- as.factor(D3$Country)
D3$LAType <- as.factor(D3$LAType)
Inve <- D3 %>% group_by(Country,LAType)%>%
dplyr::summarise( Voluntaryclosure = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure`), Seizure = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood`),
Suspension = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence`),
Hygieneemergencyprohibitionnotice=sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice`),
Prohibitionorder=sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder`),
Simplecaution=sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution`),
Hygieneimprovementnotices=sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices`),
Remedialaction=sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices`),
Writtenwarnings=sum( `TotalnumberofestablishmentssubjecttoWrittenwarnings`),
Prosecutionsconcluded=sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded`)
)
## `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
print(Inve)
## # A tibble: 6 × 12
## # Groups: Country [3]
## Country LAType Volun…¹ Seizure Suspe…² Hygie…³ Prohi…⁴ Simpl…⁵ Hygie…⁶ Remed…⁷ Writt…⁸ Prose…⁹
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 England Distr… 196 63 1 25 12 51 727 17 48305 42
## 2 England Londo… 167 56 7 108 15 20 612 12 22475 77
## 3 England Metro… 272 41 2 64 7 36 555 20 30279 48
## 4 England Unita… 188 197 11 46 13 26 508 8 31370 30
## 5 Northern I… NI Un… 20 19 0 0 0 2 14 8 6747 1
## 6 Wales Welsh… 98 40 3 4 0 18 196 50 12454 33
## # … with abbreviated variable names ¹Voluntaryclosure, ²Suspension,
## # ³Hygieneemergencyprohibitionnotice, ⁴Prohibitionorder, ⁵Simplecaution,
## # ⁶Hygieneimprovementnotices, ⁷Remedialaction, ⁸Writtenwarnings, ⁹Prosecutionsconcluded
The table above gives the total number of establishments subject to each type of enforcement action. Across all local authorities, 151,630 establishments received written warnings and 24 establishments were subject to suspension or revocation of approval or licence. Local authorities in Northern Ireland reported no establishments subject to suspension, a hygiene emergency prohibition notice or a prohibition order.
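The two totals quoted above can be checked directly from the printed table. The sketch below copies the per-type figures from the `Writtenwarnings` and `Suspension` columns by hand; the vector names are illustrative and not part of the original analysis.

```r
# Per-LA-type values copied from the printed `Inve` table above.
written_warnings <- c(48305, 22475, 30279, 31370, 6747, 12454)
suspensions <- c(1, 7, 2, 11, 0, 3)

# Totals across all local authority types.
total_written_warnings <- sum(written_warnings)
total_suspensions <- sum(suspensions)
print(c(written_warnings = total_written_warnings, suspensions = total_suspensions))
```

With the real object in memory, `sum(Inve$Writtenwarnings)` would give the same written-warnings total without retyping the numbers.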
# Histograms of % of interventions achieved for each rating (A-E), coloured by LA type
dA <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedA`)) + geom_histogram(aes(fill = LAType), colour = "black", binwidth = 2) +
labs(x = "% of Interventions Achieved (Rated A)") +
theme(legend.position = "none")
dB <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedB`)) + geom_histogram(aes(fill = LAType), colour = "black", binwidth = 2) +
labs(x = "% of Interventions Achieved (Rated B)") +
theme(legend.position = "none")
dC <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedC`)) + geom_histogram(aes(fill = LAType), colour = "black", binwidth = 2) +
labs(x = "% of Interventions Achieved (Rated C)") +
theme(legend.position = "none")
dD <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedD`)) + geom_histogram(aes(fill = LAType), colour = "black", binwidth = 2) +
labs(x = "% of Interventions Achieved (Rated D)") +
theme(legend.position = "none")
dE <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedE`)) + geom_histogram(aes(fill = LAType), colour = "black", binwidth = 2) +
labs(x = "% of Interventions Achieved (Rated E)") +
theme(legend.position = "none")
# Arrange the panels with a shared legend (ggpubr) and add a common title
annotate_figure(ggarrange(dA, dB, dC, dD, dE, common.legend = TRUE), top = "Individual Impact Level (A-E) Distribution")
A similar left skew appears for every rating category (A-E):
most interventions are achieved for establishments of each rating.
x <- rcorr(as.matrix(select(df, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`)))
print(x)
## Total%ofInterventionsachieved(premisesratedA-E)
## Total%ofInterventionsachieved(premisesratedA-E) 1.00
## ProfessionalFullTimeEquivalentPosts-occupied * -0.02
## ProfessionalFullTimeEquivalentPosts-occupied *
## Total%ofInterventionsachieved(premisesratedA-E) -0.02
## ProfessionalFullTimeEquivalentPosts-occupied * 1.00
##
## n= 347
##
##
## P
## Total%ofInterventionsachieved(premisesratedA-E)
## Total%ofInterventionsachieved(premisesratedA-E)
## ProfessionalFullTimeEquivalentPosts-occupied * 0.6552
## ProfessionalFullTimeEquivalentPosts-occupied *
## Total%ofInterventionsachieved(premisesratedA-E) 0.6552
## ProfessionalFullTimeEquivalentPosts-occupied *
The correlation between the percentage of interventions achieved and the number of FTE employees is very weak (r = -0.02), and the p-value (0.6552) is well above 0.05, so the relationship is not statistically significant.
ggplot(df, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The roughly flat trend line indicates that there is no strong correlation between
the number of employees and the percentage of successful interventions.
# Linear Regression
model_overall <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df)
The p-value of 0.6552 is greater than 0.05, so the effect is not statistically significant. R-squared is 0.0005787, meaning the model explains less than 0.1% of the variation in the dependent variable. The intercept of 87.1091 indicates that a local authority with 0 FTE employees is predicted to achieve about 87% of its interventions. The slope of -0.1195 indicates that each additional FTE employee lowers the predicted percentage by about 0.12 percentage points, which is negligible in practice.
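To make the size of the slope concrete, the sketch below plugs the reported coefficients into the fitted line at a few illustrative staffing levels. The staffing levels (0, 5 and 10 FTE) are chosen only for illustration. It also recovers the correlation from R-squared, which is valid here because the model has a single predictor.

```r
# Coefficients and R-squared copied from the regression results in this section.
intercept <- 87.1091
slope <- -0.1195
r_squared <- 0.0005787

# Predicted % of interventions achieved at illustrative staffing levels.
fte <- c(0, 5, 10)
predicted <- intercept + slope * fte
print(data.frame(fte, predicted))

# With one predictor, R-squared is the squared Pearson correlation,
# and the correlation takes the sign of the slope.
r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))
```

Going from 0 to 10 FTE posts changes the prediction by little more than one percentage point, and the recovered correlation agrees with the -0.02 reported by `rcorr()`.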
#output of Linear Regression
summary(model_overall)
##
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~
## `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.304 -4.575 4.067 8.658 13.860
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 87.1091 1.2828 67.905 <2e-16 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *` -0.1195 0.2675 -0.447 0.655
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.4 on 345 degrees of freedom
## Multiple R-squared: 0.0005787, Adjusted R-squared: -0.002318
## F-statistic: 0.1998 on 1 and 345 DF, p-value: 0.6552
cbind(coefficient=coef(model_overall), confint(model_overall))
## coefficient 2.5 % 97.5 %
## (Intercept) 87.1091495 84.5860343 89.6322647
## `ProfessionalFullTimeEquivalentPosts-occupied *` -0.1195469 -0.6456029 0.4065092
The results suggest an average decrease of 0.12 percentage points in interventions achieved for each additional FTE employee. The 95% confidence interval includes zero (95% CI = [-0.65, 0.41]), so this decrease is not significantly different from zero, t(345) = -0.45, p = 0.655.
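The reported test statistic, p-value and confidence interval all follow from the slope estimate, its standard error and the residual degrees of freedom. The sketch below reproduces them by hand from the printed values. Because the standard error is copied at four decimal places, the results match the output only up to rounding.

```r
# Values copied from the regression output above (slope for FTE posts).
estimate <- -0.1195469
std_error <- 0.2675
df_resid <- 345

# t statistic: estimate divided by its standard error.
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval: estimate +/- critical t value * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

print(round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 3))
```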
This report was created to help politicians and managers at the Food Standards Agency understand how businesses in different local authorities in the United Kingdom responded to the intervention measures applied to them. The data set initially contains 353 local authorities from three countries (England, Wales and Northern Ireland). There are six local authority types across these three countries: District Council, London Borough, Metropolitan Borough Council, Unitary Authority, NI Unitary Authority and Welsh Unitary Authority. The data set has 36 columns of information about each local authority. Six rows with NA values were removed during cleaning, leaving 347 local authorities for analysis. The distribution of the percentage of successful enforcement actions across Local
The distribution of interventions achieved at each individual impact level is shown in the figure below.
All of the graphs indicate that most interventions were achieved for every rating across the different LA types. Across all LA types, more than 300 local authorities achieved at least 90% of their interventions for A-rated establishments. E-rated establishments have more cases where less than 75% of interventions were achieved than any other rating. For ratings A-D, most local authorities of every type achieved more than 75% of their interventions.
This is a report for a manager of a publishing company with specific analysis.
The data provided contains information on e-book sales over a period of many months. Each row in the data represents one book. The values of the variables are taken across the entire time period, so daily.sales is the average number of sales (minus refunds) across all days in the period, and sale.price is the average price for which the book sold across all sales in the period.
| Variable Name | Description |
|---|---|
| genre | The book's genre. |
| avg.review | Average review score of the e-book. |
| daily.sales | Average number of sales (minus refunds) across all days in the period. |
| total.reviews | Total number of reviews of the e-book. |
| sale.price | Average price for which the book sold across all sales in the period. |
bd <- read_csv("publisher_sales.csv")
## Rows: 6000 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sold by, publisher.type, genre
## dbl (4): avg.review, daily.sales, total.reviews, sale.price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bd)
## spc_tbl_ [6,000 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ sold by : chr [1:6000] "Random House LLC" "Amazon Digital Services, Inc." "Amazon Digital Services, Inc." "Amazon Digital Services, Inc." ...
## $ publisher.type: chr [1:6000] "big five" "indie" "small/medium" "small/medium" ...
## $ genre : chr [1:6000] "childrens" "non_fiction" "non_fiction" "fiction" ...
## $ avg.review : num [1:6000] 4.44 4.19 3.71 4.72 4.65 4.81 4.33 4.21 3.95 4.66 ...
## $ daily.sales : num [1:6000] 61.5 74.9 66 85.2 37.7 ...
## $ total.reviews : num [1:6000] 92 130 118 179 111 106 205 86 161 81 ...
## $ sale.price : num [1:6000] 8.03 9.08 9.48 12.32 5.78 ...
## - attr(*, "spec")=
## .. cols(
## .. `sold by` = col_character(),
## .. publisher.type = col_character(),
## .. genre = col_character(),
## .. avg.review = col_double(),
## .. daily.sales = col_double(),
## .. total.reviews = col_double(),
## .. sale.price = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
#Genre should be of factor datatype instead of character
bd$genre <- as.factor(bd$genre)
levels(bd$genre)
## [1] "childrens" "fiction" "non_fiction"
summary(bd)
## sold by publisher.type genre avg.review daily.sales
## Length:6000 Length:6000 childrens :2000 Min. :0.000 Min. : -0.53
## Class :character Class :character fiction :2000 1st Qu.:4.100 1st Qu.: 56.77
## Mode :character Mode :character non_fiction:2000 Median :4.400 Median : 74.29
## Mean :4.267 Mean : 79.11
## 3rd Qu.:4.620 3rd Qu.: 98.02
## Max. :4.980 Max. :207.98
## total.reviews sale.price
## Min. : 0.0 Min. : 0.740
## 1st Qu.:105.0 1st Qu.: 7.140
## Median :128.0 Median : 8.630
## Mean :132.6 Mean : 8.641
## 3rd Qu.:163.0 3rd Qu.:10.160
## Max. :248.0 Max. :17.460
The daily sales averages include negative values, which are not possible, so rows with values below zero are removed. The total and average review columns also show a few books with 0 reviews.
# Keep only rows with positive daily sales
bd <- bd %>% filter(daily.sales > 0)
summary(bd)
## sold by publisher.type genre avg.review daily.sales
## Length:5999 Length:5999 childrens :2000 Min. :0.000 Min. : 3.49
## Class :character Class :character fiction :2000 1st Qu.:4.100 1st Qu.: 56.78
## Mode :character Mode :character non_fiction:1999 Median :4.400 Median : 74.30
## Mean :4.267 Mean : 79.12
## 3rd Qu.:4.620 3rd Qu.: 98.02
## Max. :4.980 Max. :207.98
## total.reviews sale.price
## Min. : 0.0 Min. : 0.740
## 1st Qu.:105.0 1st Qu.: 7.140
## Median :128.0 Median : 8.630
## Mean :132.6 Mean : 8.641
## 3rd Qu.:163.0 3rd Qu.:10.160
## Max. :248.0 Max. :17.460
# Histogram of book sale prices
ggplot(data = bd, aes(x = sale.price)) + geom_histogram(binwidth = 0.2)
# Average reviews for different sorts of publishers are displayed.
ggplot(data = bd, aes(x = avg.review, fill = publisher.type,alpha = 0.1)) + geom_histogram(binwidth = 0.1,position = 'dodge')
Books sold by small and medium-sized publishers receive comparatively high average reviews. The plot also shows several books with an average review of zero.
Sales.Genre <- bd %>% group_by(genre) %>%
summarise(average.sales = mean(daily.sales))
ggplot(Sales.Genre, aes(x=genre, y=average.sales)) +
geom_bar(stat = "identity") +
geom_text(aes(label = round(average.sales, 2)), vjust = -0.2) +
labs(x="Genre of Books", y="Avg Sales", title = "Average e-Book Sales for each Genre ")
Daily sales vary by genre. Fiction has the highest average daily sales, followed by non-fiction and children's books.
rcorr(as.matrix(bd %>% select(avg.review, daily.sales, total.reviews, sale.price)))
## avg.review daily.sales total.reviews sale.price
## avg.review 1.00 -0.01 0.10 -0.02
## daily.sales -0.01 1.00 0.66 -0.28
## total.reviews 0.10 0.66 1.00 -0.26
## sale.price -0.02 -0.28 -0.26 1.00
##
## n= 5999
##
##
## P
## avg.review daily.sales total.reviews sale.price
## avg.review 0.6862 0.0000 0.2450
## daily.sales 0.6862 0.0000 0.0000
## total.reviews 0.0000 0.0000 0.0000
## sale.price 0.2450 0.0000 0.0000
grid.arrange(
ggplot(bd, aes(x=daily.sales, y=avg.review)) + geom_point() + geom_smooth() + labs(y="Average Review", x="Daily Sales"),
ggplot(bd, aes(x=daily.sales, y=total.reviews)) + geom_point() + geom_smooth() + labs(y="Total Review", x="Daily Sales"),
ggplot(bd, aes(x=avg.review, y=total.reviews)) + geom_point() + geom_smooth() + labs(y="Total Review", x="Average Review"),
nrow=3, top="Necessary Correlations")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
There is no strong correlation between average review and daily sales (r = -0.01) or between average review and total reviews (r = 0.10), so average review and total reviews are largely unrelated to each other. There is a positive correlation between total reviews and daily sales (r = 0.66).
# Creating a correlation data frame
cor_avd <- cor(bd$avg.review,bd$daily.sales)
cor_tvd <- cor(bd$total.reviews,bd$daily.sales)
cor_avt <- cor(bd$total.reviews,bd$avg.review)
print(data.frame(cor_avd,cor_tvd,cor_avt))
## cor_avd cor_tvd cor_avt
## 1 -0.00521738 0.6638385 0.1044134
The correlation between total reviews and daily sales is 0.66, a moderately strong positive correlation.
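As a rough cross-check, the P entries printed by rcorr can be reproduced from the correlation coefficient and the sample size alone. The sketch below assumes the usual t-test for a Pearson correlation; `cor_p_value` is a hypothetical helper name introduced for illustration, not a function from Hmisc.

```r
# Two-sided p-value for a Pearson correlation r from n paired observations,
# using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Correlations copied from the output above (n = 5999)
p_avd <- cor_p_value(-0.00521738, 5999)  # avg.review vs daily.sales
p_tvd <- cor_p_value(0.6638385, 5999)    # total.reviews vs daily.sales

round(p_avd, 4)
p_tvd < 0.001
```

The first value should land close to the 0.6862 shown in the P matrix, while the second is effectively zero.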
#Performing Multiple Linear Regression to predict the daily sales based on average and total reviews
m1 <- lm(daily.sales ~ avg.review + total.reviews, data = bd)
summary(m1)
##
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = bd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.407 -14.656 -1.071 13.672 122.177
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.123430 2.340120 10.309 < 2e-16 ***
## avg.review -3.999637 0.512874 -7.798 7.34e-15 ***
## total.reviews 0.543327 0.007816 69.517 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.58 on 5996 degrees of freedom
## Multiple R-squared: 0.4463, Adjusted R-squared: 0.4461
## F-statistic: 2416 on 2 and 5996 DF, p-value: < 2.2e-16
cbind(coef(m1),confint(m1))
## 2.5 % 97.5 %
## (Intercept) 24.123430 19.5359532 28.7109077
## avg.review -3.999637 -5.0050539 -2.9942200
## total.reviews 0.543327 0.5280054 0.5586487
When total reviews and average review are entered in the same regression, each additional review predicts 0.543 more daily sales, holding average review constant (t(5996) = 69.52, p < 0.001, 95% CI [0.53, 0.56]), and a 1-unit increase in average review predicts a decrease in daily sales of 4.00, holding total reviews constant (t(5996) = -7.80, p < 0.001, 95% CI [-5.01, -2.99]).
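To make the coefficients concrete, the fitted equation can be applied by hand. This is a minimal sketch using the estimates printed by summary(m1); the book described by `new_book` is a hypothetical example, not a row from the data.

```r
# Coefficients copied from the summary(m1) output above
b <- c(intercept = 24.123430, avg.review = -3.999637, total.reviews = 0.543327)

# Hypothetical book (illustrative values only): average review 4, 100 reviews
new_book <- c(avg.review = 4, total.reviews = 100)

# Fitted value: intercept + slope * predictor, summed over both predictors
predicted <- b[["intercept"]] +
  b[["avg.review"]] * new_book[["avg.review"]] +
  b[["total.reviews"]] * new_book[["total.reviews"]]

round(predicted, 1)
```

With the model object available, `predict(m1, newdata = data.frame(avg.review = 4, total.reviews = 100))` should give the same figure up to rounding of the copied coefficients.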
Because some books in the data have zero total reviews (and therefore an average review of zero), we fit a second model that excludes them.
#Performing Multiple Linear Regression to predict the daily sales based on average and total reviews, after removing books with 0 reviews
bd_null <- bd %>% filter(total.reviews != 0)
m2 <- lm(daily.sales ~ avg.review + total.reviews, data = bd_null)
summary(m2)
##
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = bd_null)
##
## Residuals:
## Min 1Q Median 3Q Max
## -104.341 -14.628 -0.752 13.829 93.489
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.476624 2.650690 2.066 0.0389 *
## avg.review -0.366436 0.565700 -0.648 0.5172
## total.reviews 0.564835 0.007831 72.127 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.2 on 5973 degrees of freedom
## Multiple R-squared: 0.4655, Adjusted R-squared: 0.4653
## F-statistic: 2601 on 2 and 5973 DF, p-value: < 2.2e-16
cbind(coef(m2),confint(m2))
## 2.5 % 97.5 %
## (Intercept) 5.4766239 0.2803136 10.6729342
## avg.review -0.3664363 -1.4754132 0.7425405
## total.reviews 0.5648349 0.5494831 0.5801868
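With the zero-review books removed, average review is no longer a significant predictor: a 1-unit increase in average review predicts a change in daily sales of -0.37 (t(5973) = -0.65, p = 0.517, 95% CI [-1.48, 0.74]), an interval that includes zero. Each additional review still predicts 0.56 more daily sales (t(5973) = 72.13, p < 0.001, 95% CI [0.55, 0.58]).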
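The confidence limits reported by confint(m2) follow from the estimate, its standard error and the residual degrees of freedom. A short sketch, assuming the standard t-based interval for a linear model coefficient and using the rounded values printed above:

```r
# avg.review row of summary(m2): estimate, standard error, residual df
est <- -0.366436
se  <- 0.565700
df  <- 5973

# 95% interval: estimate +/- t quantile * standard error
ci <- est + c(-1, 1) * qt(0.975, df) * se
round(ci, 3)

# The interval spans zero, which matches the non-significant p-value
ci[1] < 0 && ci[2] > 0
```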
#Correlation between number of sales and sale price
rcorr(as.matrix(select(bd, sale.price, daily.sales)))
## sale.price daily.sales
## sale.price 1.00 -0.28
## daily.sales -0.28 1.00
##
## n= 5999
##
##
## P
## sale.price daily.sales
## sale.price 0
## daily.sales 0
ggplot(data = bd, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price",title = expression(r == -0.28))
## `geom_smooth()` using formula = 'y ~ x'
There is a weak negative correlation between daily sales and sale price (r = -0.28), which is statistically significant (p < 0.001).
# Sales Price VS Daily Sales Graph
ggplot(data = bd, aes(x = sale.price, y = daily.sales, color = genre)) + geom_point(alpha = 0.1) + geom_smooth(method = lm)
## `geom_smooth()` using formula = 'y ~ x'
#Different genres data frame
c <- bd %>% filter(genre == 'childrens')
f <- bd %>% filter(genre == 'fiction')
nf <- bd %>% filter(genre == 'non_fiction')
k1 <- ggplot(data = c, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price", title ='Children')
k2 <- ggplot(data =f, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price",title = "Fiction")
k3 <- ggplot(data = nf, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price",title="Non-Fiction")
grid.arrange(k1,k2,k3, ncol=3)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
For the children’s genre there is a clear negative relationship between sale price and daily sales. No strong relationship is visible for the other two genres.
#Simple model with just daily.sales as a function of sale.price
ds_sp <- lm(daily.sales ~ sale.price, data = bd)
#Output of Model
print(summary(ds_sp))
##
## Call:
## lm(formula = daily.sales ~ sale.price, data = bd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -80.760 -20.644 -4.638 17.084 130.301
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 112.0540 1.5201 73.72 <2e-16 ***
## sale.price -3.8110 0.1704 -22.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.15 on 5997 degrees of freedom
## Multiple R-squared: 0.07696, Adjusted R-squared: 0.0768
## F-statistic: 500 on 1 and 5997 DF, p-value: < 2.2e-16
print(cbind(coef(ds_sp),confint(ds_sp)))
## 2.5 % 97.5 %
## (Intercept) 112.054023 109.074077 115.033968
## sale.price -3.810984 -4.145101 -3.476867
print(( ds_sp_emm <- emmeans(ds_sp, ~sale.price) ))
## sale.price emmean SE df lower.CL upper.CL
## 8.64 79.1 0.376 5997 78.4 79.9
##
## Confidence level used: 0.95
The estimated coefficient for sale.price is negative: each 1-unit increase in sale price predicts 3.81 fewer daily sales (t(5997) = -22.36, p < 0.001, 95% CI [-4.15, -3.48]).
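The emmeans figure above is simply the fitted line evaluated at the mean sale price. The sketch below reproduces it from the printed coefficients; the mean price of 8.64 is the rounded value shown by emmeans, so the result only agrees to about one decimal place.

```r
# Coefficients copied from the confint(ds_sp) output above
b0 <- 112.054023   # intercept
b1 <- -3.810984    # change in daily sales per 1-unit increase in sale price

# Mean sale price as printed (rounded) by emmeans
mean_price <- 8.64

# Predicted daily sales at the mean price
pred_at_mean <- b0 + b1 * mean_price
round(pred_at_mean, 1)
```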
#Model including genre
g1 <- lm(daily.sales ~ sale.price + genre, data = bd)
( g1.emm <- emmeans(g1, ~sale.price) )
## sale.price emmean SE df lower.CL upper.CL
## 8.64 79.1 0.286 5995 78.6 79.7
##
## Results are averaged over the levels of: genre
## Confidence level used: 0.95
#ANOVA method to compare models
anova(ds_sp, g1)
## Analysis of Variance Table
##
## Model 1: daily.sales ~ sale.price
## Model 2: daily.sales ~ sale.price + genre
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 5997 5097347
## 2 5995 2944127 2 2153220 2192.3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
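The F statistic in this table can be recovered from the residual sums of squares and degrees of freedom alone. A minimal sketch, assuming the usual nested-model F test and using the rounded RSS values printed above:

```r
# Values copied from the anova(ds_sp, g1) table
rss_reduced <- 5097347; df_reduced <- 5997   # daily.sales ~ sale.price
rss_full    <- 2944127; df_full    <- 5995   # daily.sales ~ sale.price + genre

# Drop in RSS per added parameter, relative to the full model's residual variance
df_diff <- df_reduced - df_full
f_stat  <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

round(f_stat, 1)
p_value < 2.2e-16
```

Adding genre reduces the residual sum of squares by roughly 42%, which is why the F statistic is so large.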
#Interaction Terms
int <- lm(daily.sales ~ sale.price * genre, data = bd)
summary(int)
##
## Call:
## lm(formula = daily.sales ~ sale.price * genre, data = bd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -102.383 -13.374 0.018 13.042 102.366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.8781 2.5003 29.147 < 2e-16 ***
## sale.price -1.7319 0.2453 -7.059 1.87e-12 ***
## genrefiction 35.1993 3.2711 10.761 < 2e-16 ***
## genrenon_fiction 6.3974 3.2015 1.998 0.045736 *
## sale.price:genrefiction 1.4587 0.3543 4.118 3.88e-05 ***
## sale.price:genrenon_fiction 1.3057 0.3467 3.766 0.000167 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.13 on 5993 degrees of freedom
## Multiple R-squared: 0.4687, Adjusted R-squared: 0.4683
## F-statistic: 1057 on 5 and 5993 DF, p-value: < 2.2e-16
anova(ds_sp, g1,int)
## Analysis of Variance Table
##
## Model 1: daily.sales ~ sale.price
## Model 2: daily.sales ~ sale.price + genre
## Model 3: daily.sales ~ sale.price * genre
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 5997 5097347
## 2 5995 2944127 2 2153220 2199.194 < 2.2e-16 ***
## 3 5993 2933858 2 10269 10.489 2.836e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
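Adding the sale.price:genre interaction significantly improves the fit (F(2, 5993) = 10.49, p < 0.001), so the effect of sale price on daily sales differs by genre. Both interaction coefficients are positive relative to the baseline genre (childrens): the sale.price slope is -1.73 for children’s books but only about -0.27 for fiction and -0.43 for non-fiction. In other words, a higher price is associated with a much larger drop in daily sales for children’s books than for the other two genres.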
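The interaction coefficients in the summary(int) output can be converted into one sale.price slope per genre by adding each interaction term to the baseline slope. A small sketch using the printed estimates, with childrens as the reference level:

```r
# Slope-related coefficients copied from summary(int)
b <- c(
  "sale.price"                  = -1.7319,
  "sale.price:genrefiction"     =  1.4587,
  "sale.price:genrenon_fiction" =  1.3057
)

# Slope of daily.sales on sale.price within each genre
slopes <- c(
  childrens   = b[["sale.price"]],
  fiction     = b[["sale.price"]] + b[["sale.price:genrefiction"]],
  non_fiction = b[["sale.price"]] + b[["sale.price:genrenon_fiction"]]
)
round(slopes, 2)
```

With the fitted model available, `emmeans::emtrends(int, ~ genre, var = "sale.price")` should report the same per-genre slopes along with confidence intervals.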
The purpose of this report is to help the publishing company’s managers comprehend the progress in e-book sales. The initial set of data included 6000 e-books and seven variables that tracked e-book sales over time.
E-books from all three major categories (children’s, fiction, and non-fiction) are included in the data. Below are the average sales for each of these genres:
## # A tibble: 3 × 2
## genre average.sales
## <fct> <dbl>
## 1 childrens 55.6
## 2 fiction 106.
## 3 non_fiction 75.9
Average daily sales across all books are 79.1, and all three genres deviate substantially from this value.
Daily sales differ markedly between genres. Fiction has the highest average daily sales (about 106), followed by non-fiction (75.9) and children’s books (55.6).
The Graph below shows the correlation between Average Reviews, Total Reviews and Daily Sales
Daily sales have a moderately strong positive correlation with total reviews (r = 0.66), while the correlations among the other pairs of variables are negligible. Books with more reviews therefore tend to have higher daily sales.
To examine this relationship further, we fit a multiple linear regression model:
##
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = bd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.407 -14.656 -1.071 13.672 122.177
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.123430 2.340120 10.309 < 2e-16 ***
## avg.review -3.999637 0.512874 -7.798 7.34e-15 ***
## total.reviews 0.543327 0.007816 69.517 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.58 on 5996 degrees of freedom
## Multiple R-squared: 0.4463, Adjusted R-squared: 0.4461
## F-statistic: 2416 on 2 and 5996 DF, p-value: < 2.2e-16
## 2.5 % 97.5 %
## (Intercept) 24.123430 19.5359532 28.7109077
## avg.review -3.999637 -5.0050539 -2.9942200
## total.reviews 0.543327 0.5280054 0.5586487
The model shows that daily sales rise with total reviews: each additional review predicts 0.54 more daily sales, holding average review constant. A higher average review, by contrast, predicts lower daily sales in this model (about 4 fewer per 1-unit increase).
The correlation graph between daily sales and sale price is given below.
A weak negative correlation is visible: books with higher sale prices tend to have lower daily sales. We examine this further with a linear regression and estimated means.
##
## Call:
## lm(formula = daily.sales ~ sale.price, data = bd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -80.760 -20.644 -4.638 17.084 130.301
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 112.0540 1.5201 73.72 <2e-16 ***
## sale.price -3.8110 0.1704 -22.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.15 on 5997 degrees of freedom
## Multiple R-squared: 0.07696, Adjusted R-squared: 0.0768
## F-statistic: 500 on 1 and 5997 DF, p-value: < 2.2e-16
## 2.5 % 97.5 %
## (Intercept) 112.054023 109.074077 115.033968
## sale.price -3.810984 -4.145101 -3.476867
## sale.price emmean SE df lower.CL upper.CL
## 8.64 79.1 0.376 5997 78.4 79.9
##
## Confidence level used: 0.95
Since the estimated coefficient for sale.price is negative, a 1-unit increase in sale price predicts a decrease of 3.81 in daily sales.
The correlation graph for sale price with respect to genre is shown below:
To be precise we can show the correlation individually:
There is no strong correlation between sale price and daily sales for the fiction and non-fiction genres, but there is a negative correlation for the children’s genre: daily sales decrease as sale price increases.
Furthermore, the impact of sale price on daily sales varies by genre, since the interaction between sale price and genre is statistically significant.